CroDeriV: a new resource for processing Croatian morphology

Authors

  • Kresimir Sojat
  • Matea Srebacic
  • Marko Tadic
  • Tin Pavelic
Abstract

The paper deals with the processing of Croatian morphology and presents CroDeriV, a newly developed language resource that contains data on the morphological structure and derivational relatedness of Croatian verbs. In its present form, CroDeriV contains 14,192 Croatian verbs. Each verb in CroDeriV is segmented into lexical, derivational and inflectional morphemes. This structure enables the detection of verbal derivational families in Croatian, as well as of the distribution and frequency of particular affixes and lexical morphemes. A derivational family consists of a verbal base form and all of its prefixed or suffixed derivatives found in the available machine-readable Croatian dictionaries and corpora. Data structured in this way were further used to expand other language resources for Croatian, such as the Croatian WordNet and the Croatian Morphological Lexicon. Matching the CroDeriV data against the Croatian WordNet and the Croatian Morphological Lexicon resulted in a significant enrichment of the former and an enlargement of the latter.
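The abstract describes each verb as a sequence of lexical, derivational and inflectional morphemes, with verbs grouped into derivational families around a base form. The Python sketch below illustrates that idea under stated assumptions: the `SegmentedVerb` layout and the helper names (`derivational_families`, `affix_frequencies`, `missing_from_lexicon`) are hypothetical and do not reproduce CroDeriV's actual format; the sample segmentations are illustrative only; and a family is simplified to the verbs sharing a lexical morpheme, with the least-prefixed member taken as the base form. The last helper hints at how such data could be matched against another lexicon to find candidate new entries.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class SegmentedVerb:
    """A verb split into morphemes (hypothetical layout, not CroDeriV's format)."""
    lemma: str
    prefixes: tuple[str, ...]   # derivational prefixes, outermost first
    lexical: str                # lexical morpheme (root)
    suffixes: tuple[str, ...]   # derivational suffixes (thematic, aspectual)
    ending: str                 # inflectional morpheme (infinitive ending)

    def __post_init__(self) -> None:
        # The segmentation must reassemble into the lemma exactly.
        if "".join(self.segments()) != self.lemma:
            raise ValueError(f"segments do not match lemma {self.lemma!r}")

    def segments(self) -> list[str]:
        return [*self.prefixes, self.lexical, *self.suffixes, self.ending]


def derivational_families(verbs: list[SegmentedVerb]) -> dict[str, tuple[str, list[str]]]:
    """Group verbs by lexical morpheme; the least-prefixed member is the base."""
    groups: dict[str, list[SegmentedVerb]] = defaultdict(list)
    for verb in verbs:
        groups[verb.lexical].append(verb)
    families = {}
    for root, members in groups.items():
        base = min(members, key=lambda v: (len(v.prefixes), len(v.lemma)))
        derivatives = sorted(v.lemma for v in members if v is not base)
        families[root] = (base.lemma, derivatives)
    return families


def affix_frequencies(verbs: list[SegmentedVerb]) -> tuple[Counter, Counter]:
    """Count how often each prefix and each derivational suffix occurs."""
    prefixes = Counter(p for v in verbs for p in v.prefixes)
    suffixes = Counter(s for v in verbs for s in v.suffixes)
    return prefixes, suffixes


def missing_from_lexicon(verbs: list[SegmentedVerb], lexicon: set[str]) -> list[str]:
    """Lemmas present in the derivational data but absent from another lexicon."""
    return sorted(v.lemma for v in verbs if v.lemma not in lexicon)


# Illustrative sample entries (not taken from CroDeriV).
VERBS = [
    SegmentedVerb("pisati", (), "pis", ("a",), "ti"),
    SegmentedVerb("napisati", ("na",), "pis", ("a",), "ti"),
    SegmentedVerb("prepisati", ("pre",), "pis", ("a",), "ti"),
    SegmentedVerb("zapisivati", ("za",), "pis", ("iva",), "ti"),
    SegmentedVerb("nositi", (), "nos", ("i",), "ti"),
    SegmentedVerb("donositi", ("do",), "nos", ("i",), "ti"),
    SegmentedVerb("prenositi", ("pre",), "nos", ("i",), "ti"),
]

if __name__ == "__main__":
    for root, (base, derivatives) in derivational_families(VERBS).items():
        print(f"{root}: base={base}, derivatives={derivatives}")
    prefix_counts, suffix_counts = affix_frequencies(VERBS)
    print("prefixes:", prefix_counts.most_common())
    print("suffixes:", suffix_counts.most_common())
    toy_lexicon = {"pisati", "napisati", "nositi"}  # hypothetical lexicon
    print("missing:", missing_from_lexicon(VERBS, toy_lexicon))
```

Run as a script, it prints each family with its base form and derivatives, the prefix and suffix counts, and the lemmas missing from a toy lexicon. Grouping only by lexical morpheme is a deliberate simplification; a real resource would also have to handle allomorphy and homonymous roots.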

Similar resources

Enlarging the Croatian WordNet with WN-Toolkit and Cro-Deriv

Wordnet is a standard semantic resource for several Natural Language Processing tasks and it is available for an increasing number of languages. The Croatian Wordnet (CroWN) was a relatively small resource with 10,026 synsets and 31,367 synset-variant pairs covering only 45.91% of the so-called Core WordNet. Comparing these figures with the size of the Princeton WordNet for English version 3.0,...

DerivBase.hr: A High-Coverage Derivational Morphology Resource for Croatian

Knowledge about derivational morphology has been proven useful for a number of natural language processing (NLP) tasks. We describe the construction and evaluation of DERIVBASE.HR, a large-coverage morphological resource for Croatian. DERIVBASE.HR groups 100k lemmas from web corpus hrWaC into 56k clusters of derivationally related lemmas, so-called derivational families. We focus on suffixal de...

A New Insight into Morphology of Solvent Resistant Nanofiltration (SRNF) Membranes: Image Processing Assisted Review

The aim of this review is to investigate the morphological properties of polyimide-based SRNF membranes by means of image processing. The effects of phase inversion parameters, such as polymer concentration, volatile co-solvent, pre-evaporation time, additives in the coagulation bath, polymer weight ratio in composite membranes, and the addition of nanoparticles and cross-linking agents, have been reviewed. The voi...

A NEW FUZZY MORPHOLOGY APPROACH BASED ON THE FUZZY-VALUED GENERALIZED DEMPSTER-SHAFER THEORY

In this paper, a new Fuzzy Morphology (FM) based on the Generalized Dempster-Shafer Theory (GDST) is proposed. First, in order to clarify the similarity of definitions between Mathematical Morphology (MM) and Dempster-Shafer Theory (DST), the dilation and erosion morphological operations are studied from a different viewpoint. Then, based on this similarity, an FM based on the GDST is proposed. Unlik...

CroDA: A CROATIAN DISCOURSE CORPUS OF SPEAKERS WITH APHASIA

The paper describes data collection and transcription to develop the Croatian discourse corpus of speakers with aphasia (CroDA), developed within the framework of the project Adult Language Processing (HRZZ-2421-UIP-11-2013) and available from 2017 as part of the AphasiaBank database of multimedia interactions for studying communication among speakers with aphasia. In accordance with the Aphasi...

Publication date: 2014